Computer Vision Systems and Methods for Optimizing Correlation Clustering for Image Segmentation Using Benders Decomposition

ABSTRACT

Computer vision systems and methods for optimizing correlation clustering for image segmentation are provided. The system receives input data and generates a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data. The system optimizes the Benders Decomposition for the generated correlation clustering formulation and performs image segmentation using the optimized Benders Decomposition.

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application Ser. No. 62/952,732, filed on Dec. 23, 2019, the entire disclosure of which is hereby expressly incorporated by reference.

BACKGROUND

Technical Field

The present disclosure relates generally to the field of computer vision technology. More specifically, the present disclosure relates to computer vision systems and methods for optimizing correlation clustering for image segmentation using Benders decomposition.

Related Art

Many computer vision tasks involve partitioning (clustering) a set of observations into unique entities. A powerful formulation for such tasks is that of (weighted) correlation clustering. Correlation clustering is defined on a sparse graph with real valued edge weights, where nodes correspond to observations, and weighted edges describe the affinity between pairs of nodes. For example, in image segmentation on superpixel graphs, nodes correspond to superpixels, and edges indicate adjacency between the superpixels. The weight of the edge between a pair of superpixels relates to the probability, as defined by a classifier, that the two superpixels belong to the same ground truth entity. The weight is positive if the probability is greater than ½, and negative if the probability is less than ½. The magnitude of the weight is a function of the confidence of the classifier.
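By way of a non-limiting sketch, the probability-to-weight mapping described above can be realized as a log-odds transform. The log-odds form and the function name below are illustrative assumptions; the disclosure specifies only that the sign of the weight follows the ½ threshold and that its magnitude follows the classifier's confidence.

```python
import math

def edge_weight(p_same, eps=1e-6):
    # Map a classifier probability that two superpixels belong to the
    # same ground truth entity to a correlation clustering edge weight:
    # positive when p > 1/2, negative when p < 1/2, and larger in
    # magnitude as the classifier grows more confident.
    p = min(max(p_same, eps), 1.0 - eps)  # clamp away from 0 and 1
    return math.log(p / (1.0 - p))

print(edge_weight(0.9))   # confident "same entity" -> large positive weight
print(edge_weight(0.2))   # confident "different"   -> negative weight
```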

The correlation clustering cost function sums up the weights of the edges separating connected components, referred to as entities, in a proposed partitioning of the graph. Optimization in correlation clustering partitions the graph into entities so as to minimize the correlation clustering cost. Correlation clustering is appealing since the optimal number of entities emerges naturally as a function of the edge weights, rather than requiring an additional search over some model order parameter describing the number of clusters (entities).
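This cost function can be stated compactly in code. The following minimal sketch (the toy edge weights are hypothetical) sums the weights of the edges whose endpoints receive different component labels:

```python
def correlation_clustering_cost(edges, labels):
    # edges:  iterable of (d1, d2, weight) describing a sparse graph
    # labels: dict mapping each node to its component (entity) id
    # The cost is the sum of weights of edges that cross components.
    return sum(w for d1, d2, w in edges if labels[d1] != labels[d2])

# Toy example: cutting the single negative edge while keeping the
# positive edges uncut minimizes the cost.
edges = [("a", "b", 2.0), ("b", "c", -1.5), ("c", "d", 3.0)]
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
print(correlation_clustering_cost(edges, labels))  # -1.5
```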

Optimization in correlation clustering is non-deterministic polynomial-time hard ("NP-hard") for general graphs. Common approaches for optimization in correlation clustering, which are based on linear programming, do not scale easily to large correlation clustering problem instances. Therefore, there is a need for computer vision systems and methods which can accelerate optimization in correlation clustering, thereby improving the ability of computer vision systems to efficiently perform correlation clustering in domains where massively parallel computation can be exploited. These and other needs are addressed by the computer vision systems and methods of the present disclosure.

SUMMARY

The present disclosure relates to computer vision systems and methods for optimizing correlation clustering for image segmentation using Benders decomposition. The present disclosure discusses a system capable of applying Benders decomposition from operations research to correlation clustering for computer vision. Benders decomposition is commonly applied in operations research to solve mixed integer linear programs ("MILP") that have a special, but common, block structure. Benders decomposition receives a partition of the variables in the MILP between a master problem and a set of subproblems. The block structure requires that no row of the constraint matrix of the MILP contains variables from more than one subproblem. Variables explicitly enforced to be integral lie in the master problem.

The system achieves optimization in Benders decomposition using a cutting plane algorithm. Optimization proceeds with the master problem solving optimization over its variables, followed by solving the subproblems in parallel, providing primal/dual solutions over their variables conditioned on the solution to the master problem. The dual solutions to the subproblems provide constraints to the master problem. Optimization continues until no further constraints are added to the master problem. The system then accelerates Benders decomposition using the seminal operations research technique of Magnanti-Wong Benders rows ("MWR"). The system generates MWR by solving the Benders subproblems with a distinct objective under the hard constraint of optimality regarding the original subproblem objective. As such, in contrast to classic approaches to correlation clustering, the system allows for massive parallelization.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:

FIG. 1 is a diagram illustrating the overall system of the present disclosure;

FIG. 2 is a flowchart illustrating the overall process steps carried out by the system of the present disclosure;

FIG. 3 is an algorithm showing the cutting plane approach for generating Benders decomposition for correlation clustering, as described in connection with FIG. 2;

FIG. 4 depicts a set of charts demonstrating the effectiveness of the present system with various optimal parameters for different problem difficulties;

FIG. 5 is a graph illustrating the speed increase resulting from the use of parallelization by the system of the present disclosure;

FIG. 6 is a table showing the convergence of bounds for different optimal parameters of the present disclosure;

FIG. 7 depicts an algorithm showing the serial rounding procedure of the present disclosure; and

FIG. 8 is a diagram illustrating sample hardware and software components capable of being used to implement the system of the present disclosure.

DETAILED DESCRIPTION

The present disclosure relates to computer vision systems and methods for optimizing correlation clustering for image segmentation using Benders decomposition, as described in detail below in connection with FIGS. 1-8. By way of background, and before describing the systems and methods of the present disclosure in detail, the standard correlation clustering formulation will be discussed first.

The standard formulation for correlation clustering corresponds to a graph partitioning problem with respect to graph G = (𝒟, ℰ). This problem is defined by Equations 1 and 2, as follows:

$$\min_{\substack{x_{d_1 d_2} \in \{0,1\} \\ \forall (d_1, d_2) \in \mathcal{E}}} \; \sum_{(d_1, d_2) \in \mathcal{E}^-} -\varphi_{d_1 d_2} \left(1 - x_{d_1 d_2}\right) \; + \sum_{(d_1, d_2) \in \mathcal{E}^+} \varphi_{d_1 d_2}\, x_{d_1 d_2} \qquad \text{Equation 1}$$

$$\text{s.t.} \quad \sum_{(d_1, d_2) \in \mathcal{E}_c^+} x_{d_1 d_2} \;\geq\; x_{d_1^c d_2^c} \qquad \forall c \in C \qquad \text{Equation 2}$$

Where the variables are defined as:

-   d∈𝒟: The set of nodes in the graph on which correlation clustering is applied is denoted 𝒟 and indexed by d.
-   (d₁, d₂)∈ℰ: The set of undirected edges in the graph on which correlation clustering is applied is denoted ℰ and indexed by nodes d₁, d₂. The graph described by ℰ is sparse for real problems.
-   x_(d₁d₂)∈{0, 1}: x_(d₁d₂)=1 indicates that nodes d₁, d₂ are in separate components, and is zero otherwise. (d₁, d₂) is referred to as an edge, and an edge with x_(d₁d₂)=1 as a cut edge.
-   φ_(d₁d₂)∈ℝ: φ_(d₁d₂) denotes the weight associated with edge (d₁, d₂). ℰ⁺, ℰ⁻ denote the subsets of ℰ for which φ_(d₁d₂) is non-negative and negative, respectively.
-   c∈C: C denotes the set of (undirected) cycles of edges in ℰ, each of which contains exactly one member of ℰ⁻. C is indexed with c.
-   (d₁^c, d₂^c): (d₁^c, d₂^c) denotes the only edge in ℰ⁻ associated with cycle c.
-   ℰ_c⁺: ℰ_c⁺ denotes the subset of ℰ⁺ associated with the cycle c.

The objective in Equation 1 describes the total weight of the cut edges. The constraints described in Equation 2 enforce the standard relaxation of correlation clustering, which requires that transitivity regarding the association of nodes with components be respected. Equation 2 can be explained with the following example. Any cycle of edges c contains exactly one edge (d₁^c, d₂^c)∈ℰ⁻. Equation 2 states that if edge (d₁^c, d₂^c) is cut, then at least one other edge must be cut on the cycle. If this constraint is violated, this means d₁^c, d₂^c are in separate components (since x_(d₁^c d₂^c)=1), and that all nodes on the cycle are in the same component (since x_(d₁d₂)=0 for all (d₁, d₂)∈ℰ_c⁺), creating a contradiction.

The constraints in Equation 2 are referred to as cycle inequalities. Solving Equation 1 directly is intractable due to the large number of cycle inequalities. To attack such problems, prior art systems iterate between solving an integer linear program ("ILP") over a nascent set of constraints Ĉ (initialized empty), and adding new constraints from the set of currently violated cycle inequalities. Generating constraints corresponds to iterating over (d₁, d₂)∈ℰ⁻, and identifying the shortest path between d₁, d₂ in the graph with edges ℰ⁺ and weights equal to the vector x. If the corresponding path has total weight less than x_(d₁d₂), then the corresponding constraint is added to Ĉ. The linear program relaxation of Equations 1 and 2 can be solved instead of the ILP in each iteration until no violated cycle inequalities exist, after which the ILP is solved in each iteration. A sketch of this separation step is shown below.
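The separation step reduces to shortest-path computations over the positive edges. The sketch below assumes the graph is stored as an adjacency map over ℰ⁺ and that the current (possibly fractional) solution x is keyed by frozensets of endpoints; these representational choices are assumptions for illustration, not the disclosure's data layout.

```python
import heapq

def shortest_path_length(pos_adj, x, s, t):
    # Dijkstra over the positive-weight edges, using the current
    # (possibly fractional) solution x as non-negative edge lengths.
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v in pos_adj.get(u, ()):
            nd = d + x[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def violated_cycle_inequalities(neg_edges, pos_adj, x, tol=1e-9):
    # A negative edge whose endpoints are joined by a positive-edge path
    # cheaper than x on the negative edge witnesses a violated cycle
    # inequality, so its constraint should be added to the nascent set.
    for d1, d2 in neg_edges:
        if shortest_path_length(pos_adj, x, d1, d2) < x[frozenset((d1, d2))] - tol:
            yield (d1, d2)
```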

It is noted that in prior art systems, correlation clustering for computer vision did not require that cycle inequalities contain exactly one member of ℰ⁻, which is on the right hand side of Equation 2. The addition of cycle inequalities that contain edges in ℰ⁻, ℰ⁺ on the left hand side and right hand side of Equation 2, respectively, does not tighten the ILP in Equations 1 and 2 or its linear program relaxation.

The system of the present disclosure reformulates optimization in the ILP to admit efficient optimization via Benders decomposition. Benders decomposition is an exact MILP solver, but can be intuitively understood as a coordinate descent procedure, iterating between the master problem and the subproblems. Solving the subproblems not only provides a solution for their variables, but also a lower bound in the form of a hyperplane over the master problem's variables. The lower bound is tight at the current solution to the master problem.

This formulation is defined by a minimal vertex cover on ℰ⁻, with members N⊂𝒟 indexed by n. Each n∈N is associated with a Benders subproblem, and is referred to as the root of that Benders subproblem. Edges in ℰ⁻ are partitioned arbitrarily between the subproblems, such that each (d₁, d₂)∈ℰ⁻ is associated with either the subproblem with root d₁ or the subproblem with root d₂. For example, ℰ_n⁻ is the subset of ℰ⁻ associated with subproblem n. The subproblem with root n enforces the cycle inequalities C_n, where C_n is the subset of C containing edges in ℰ_n⁻. ℰ_n⁺ denotes the subset of ℰ⁺ adjacent to n. By way of example, the system assumes that N is provided. However, those skilled in the art would understand that N can be produced greedily or using an LP/ILP (linear program/integer linear program) solver, as sketched below.
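As one hedged illustration of producing N greedily (the testing described below solves for the cover exactly as an ILP instead), the following sketch covers every negative edge and then assigns each edge in ℰ⁻ to one covering endpoint. The function names and tie-breaking rule are invented for exposition.

```python
from collections import defaultdict

def greedy_vertex_cover(neg_edges):
    # Produce a (not necessarily minimal) vertex cover of the negative
    # edges: for each uncovered edge, keep the endpoint that touches
    # more negative edges. A stand-in for the exact ILP of the text.
    deg = defaultdict(int)
    for d1, d2 in neg_edges:
        deg[d1] += 1
        deg[d2] += 1
    cover = set()
    for d1, d2 in sorted(neg_edges, key=lambda e: -(deg[e[0]] + deg[e[1]])):
        if d1 not in cover and d2 not in cover:
            cover.add(d1 if deg[d1] >= deg[d2] else d2)
    return cover

def assign_subproblems(neg_edges, cover):
    # Partition E- among subproblem roots: each negative edge goes to
    # one covering endpoint (arbitrarily, the first endpoint found).
    subproblem = {n: [] for n in cover}
    for d1, d2 in neg_edges:
        root = d1 if d1 in cover else d2
        subproblem[root].append((d1, d2))
    return subproblem
```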

FIG. 1 is a diagram illustrating the system of the present disclosure, indicated generally at 10. The system 10 includes a correlation clustering optimization engine 12 which receives input data 14, processes the data, and generates output data 16 for use in connection with image segmentation. Specifically, as discussed above, correlation clustering is defined on a sparse graph with real valued edge weights, where nodes correspond to observations, and weighted edges describe the affinity between pairs of nodes. Optimization in correlation clustering partitions the graph into entities to minimize the correlation clustering cost. The correlation clustering optimization engine 12 accelerates the optimization for correlation clustering, as will be described in further detail below.

FIG. 2 is a flowchart illustrating the overall process steps carried out by the system 10, indicated generally at method 20. In step 22, the system 10 applies an auxiliary function to the optimization formulation for correlation clustering. The auxiliary function can be represented as Q(φ, n, x), which provides the cost to alter x to satisfy all cycle inequalities in C_n, by increasing/decreasing x_(d₁d₂) for (d₁, d₂) in ℰ⁺/ℰ_n⁻, respectively. Equations 3 and 4 below describe the changes to x using x^n, which is indexed as x^n_(d₁d₂).

$$\text{Equation 1} = \min_{x_{d_1 d_2} \in \{0,1\}} \; \sum_{(d_1, d_2) \in \mathcal{E}^-} -\varphi_{d_1 d_2} \left(1 - x_{d_1 d_2}\right) + \sum_{(d_1, d_2) \in \mathcal{E}^+} \varphi_{d_1 d_2}\, x_{d_1 d_2} + \sum_{n \in N} Q(\varphi, n, x) \qquad \text{Equation 3}$$

$$\text{where } Q(\varphi, n, x) = \min_{x^n_{d_1 d_2} \in \{0,1\}} \; \sum_{(d_1, d_2) \in \mathcal{E}_n^-} -\varphi_{d_1 d_2} \left(1 - x^n_{d_1 d_2}\right) + \sum_{(d_1, d_2) \in \mathcal{E}^+} \varphi_{d_1 d_2}\, x^n_{d_1 d_2} \qquad \text{Equation 4}$$

$$\text{s.t.} \quad \sum_{(d_1, d_2) \in \mathcal{E}_c^+} \left( x_{d_1 d_2} + x^n_{d_1 d_2} \right) \;\geq\; x_{d_1^c d_2^c} - \left(1 - x^n_{d_1^c d_2^c}\right) \qquad \forall c \in C_n$$

In step 24, the system 10 maps {x, x^n ∀n∈N} to a solution {x*, x^(n*) ∀n∈N}, where x* satisfies all cycle inequalities by construction, without increasing the cost according to Equation 3. The system 10 defines x* as seen below in Equation 5:

$\begin{matrix}{\left. x_{d_{1}d_{2}}^{*}\leftarrow{{\min \left( {x_{d_{1}d_{2}},x_{d_{1}d_{2}}^{n}} \right)}\mspace{25mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ_{n}^{-}}}} \right.,{n \in }} & {{Equation}\mspace{14mu} 5} \\\left. x_{d_{1}d_{A}}^{*}\leftarrow{x_{d_{1}d_{2}} + {\max\limits_{n \in }{x_{d_{1}d_{2}}^{n}\mspace{25mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ^{+}}}}}} \right. & \;\end{matrix}$

Given x*, the optimizing solution to each Benders subproblem n is denoted x^(n*), and is defined as follows. The system 10 sets x^(n*)_(d₁d₂)=1 if (d₁, d₂)∈ℰ_n⁻, and otherwise sets it to zero. It is noted that the cost of {x*, x^(n*) ∀n∈N} is no greater than that of {x, x^n ∀n∈N} with regard to the objective in Equation 3. It is further noted that Q(φ, n, x*)=0 for all n∈N. Thus there always exists an optimizing solution to Equation 3, denoted x, such that Q(φ, n, x)=0 for all n∈N. Further, there exists an optimal partition x^n in Equation 4 that is 2-colorable. This is because any partition x^n can be altered, without increasing its cost, by merging adjacent connected components not including the root node n. It is noted that merging any pair of such components does not increase the cost, since those components are not separated by negative weight edges. A sketch of the Equation 5 mapping is shown below.
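The Equation 5 mapping can be sketched as follows; the dictionary-based data layout and argument names are assumptions for illustration.

```python
def map_to_feasible(x, x_sub, neg_assign, pos_edges):
    # Merge the master solution x with the subproblem solutions x_sub
    # into x*, which satisfies all cycle inequalities by construction.
    # x:          dict edge -> value from the master problem
    # x_sub:      dict root n -> dict edge -> value (subproblem solutions)
    # neg_assign: dict root n -> negative edges assigned to subproblem n
    # pos_edges:  list of positive edges
    x_star = dict(x)
    for n, edges in neg_assign.items():
        for e in edges:   # negative edge: uncut unless both x and x^n cut it
            x_star[e] = min(x[e], x_sub[n][e])
    for e in pos_edges:   # positive edge: cut if any subproblem cuts it
        x_star[e] = x[e] + max(xn.get(e, 0) for xn in x_sub.values())
    return x_star
```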

In step 26, the system 10 adapts optimization in Q(φ, n, x). For example, the system 10 can use the node labeling formulation of min-cut, which is expressed by the following notation:

-   m_d=1 for d∈𝒟: indicates that a node d is not in the component associated with n, and is otherwise zero. To avoid extra notation, m_n is replaced by 0.
-   f_(d₁d₂)=1 for (d₁, d₂)∈ℰ⁺: indicates that the edge between d₁, d₂ is cut, but is not cut in x. Thus a penalty of φ_(d₁d₂) is added to Q(φ, n, x). It is observed that x^n_(d₁d₂)=f_(d₁d₂) for all (d₁, d₂)∈ℰ⁺.
-   f_(d₁d₂)=1 for (d₁, d₂)∈ℰ_n⁻: indicates that the edge between d₁, d₂ is not cut, but is cut in x. Thus a penalty of −φ_(d₁d₂) is added to Q(φ, n, x). It is observed that x^n_(d₁d₂)=1−f_(d₁d₂) for all (d₁, d₂)∈ℰ_n⁻.
-   For the benefit of readability, the edges are re-oriented from (d, n) to (n, d).

The system 10 then expresses Q(φ, n, x) (as expressed by Equation 6, below) as a primal/dual linear program, with primal constraints associated with dual variables ψ, λ, which are noted alongside the primal constraints. Given a binary x, the system 10 enforces that the parameters f, m are non-negative, which ensures that there is an optimizing solution for f, m that is binary. This is a consequence of the optimization being totally unimodular, given that x is binary. Total unimodularity is a known property of the min-cut/max-flow linear program.

$$Q(\varphi, n, x) = \min_{\substack{f_{d_1 d_2} \geq 0 \\ m_d \geq 0}} \; \sum_{(d_1, d_2) \in \mathcal{E}^+} \varphi_{d_1 d_2}\, f_{d_1 d_2} \; - \sum_{(n, d) \in \mathcal{E}_n^-} \varphi_{nd}\, f_{nd} \qquad \text{Equation 6}$$

$$\text{s.t.} \quad m_{d_1} - m_{d_2} \leq x_{d_1 d_2} + f_{d_1 d_2} \qquad \forall (d_1, d_2) \in \left( \mathcal{E}^+ - \mathcal{E}_n^+ \right), \quad \left[ \lambda^-_{d_1 d_2} \right]$$

$$m_{d_2} - m_{d_1} \leq x_{d_1 d_2} + f_{d_1 d_2} \qquad \forall (d_1, d_2) \in \left( \mathcal{E}^+ - \mathcal{E}_n^+ \right), \quad \left[ \lambda^+_{d_1 d_2} \right]$$

$$x_{nd} - f_{nd} \leq m_d \qquad \forall (n, d) \in \mathcal{E}_n^-, \quad \left[ \psi^-_d \right]$$

$$m_d \leq x_{nd} + f_{nd} \qquad \forall (n, d) \in \mathcal{E}_n^+, \quad \left[ \psi^+_d \right]$$
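For a tiny instance, Q(φ, n, x) of Equation 6 can be checked by brute-force enumeration over binary f and m. This is purely illustrative: the system solves Equation 6 as a min-cut/linear program, and the argument layout and name below are invented for exposition.

```python
import itertools

def brute_force_Q(nodes, pos_edges, root_pos, root_neg, phi, x):
    # nodes:     non-root nodes (the root's label m_n is fixed to 0)
    # pos_edges: edges (d1, d2) in E+ - E_n+ (neither endpoint is the root)
    # root_pos:  list of nodes d with (n, d) in E_n+
    # root_neg:  list of nodes d with (n, d) in E_n-
    # phi, x:    weights and master solution keyed by edge tuple;
    #            root-incident edges are keyed as ("n", d)
    keys = list(pos_edges) + [("n", d) for d in root_pos + root_neg]
    best = float("inf")
    for m_bits in itertools.product((0, 1), repeat=len(nodes)):
        m = dict(zip(nodes, m_bits))
        m["n"] = 0
        for f_bits in itertools.product((0, 1), repeat=len(keys)):
            f = dict(zip(keys, f_bits))
            feasible = (
                all(abs(m[a] - m[b]) <= x[(a, b)] + f[(a, b)]
                    for a, b in pos_edges)
                and all(x[("n", d)] - f[("n", d)] <= m[d] for d in root_neg)
                and all(m[d] <= x[("n", d)] + f[("n", d)] for d in root_pos)
            )
            if feasible:
                cost = (sum(phi[e] * f[e] for e in pos_edges)
                        + sum(phi[("n", d)] * f[("n", d)] for d in root_pos)
                        - sum(phi[("n", d)] * f[("n", d)] for d in root_neg))
                best = min(best, cost)
    return best
```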

In Equation 7, below, [·] denotes the binary indicator function, which returns one if the enclosed statement is true and zero otherwise.

$\begin{matrix}{{Q\left( {\varphi,n,x} \right)} = {{\max\limits_{\underset{\psi \geq 0}{\lambda \geq 0}}{- {\sum\limits_{{d_{1}d_{2}} \in {({ɛ^{+} - ɛ_{n}^{+}})}}{\left( {\lambda_{d_{1}d_{2}}^{1} + \lambda_{d_{1}d_{2}}^{2}} \right)x_{d_{1}d_{2}}}}}} + {\sum\limits_{{nd} \in ɛ_{n}^{-}}{\psi_{d}^{-}x_{nd}}} - {\sum\limits_{{nd} \in ɛ_{n}^{+}}{\psi_{d}^{+}x_{nd}}}}} & {{Equation}\mspace{14mu} 7} \\{{{s.t.\mspace{14mu} {\psi_{d_{1}}^{+}\left\lbrack {\left( {n,d_{1}} \right) \in ɛ_{n}^{+}} \right\rbrack}} - {\psi_{d_{1}}^{-}\left\lbrack {\left( {n,d_{1}} \right) \in ɛ_{n}^{-}} \right\rbrack} + {\sum\limits_{\underset{{d_{1}d_{2}} \in {ɛ^{+} - ɛ_{n}^{+}}}{d_{2}}}\left( {\lambda_{d_{1}d_{2}}^{-} - \lambda_{d_{1}d_{2}}^{+}} \right)} + {\sum\limits_{\underset{{d_{2}d_{1}} \in {ɛ^{+} - ɛ_{n}^{+}}}{d_{2}}}\left( {\lambda_{d_{2}d_{1}}^{+} - \lambda_{d_{2}d_{1}}^{-}} \right)}} \geq {0\mspace{25mu} {\forall{d_{1} \in { - n}}}}} & \; \\{\mspace{79mu} {{{- \varphi_{nd}} - \psi_{d}^{-}} \geq {0\mspace{31mu} {\forall{\left( {n,d} \right) \in ɛ_{n}^{-}}}}}} & \; \\{\mspace{79mu} {{\varphi_{nd} - \psi_{d}^{+}} \geq {0\mspace{31mu} {\forall{\left( {n,d} \right) \in ɛ_{n}^{+}}}}}} & \; \\{\mspace{79mu} {{\varphi_{d_{1}d_{2}} - \lambda_{d_{1}d_{2}}^{1} - \lambda_{d_{1}d_{2}}^{2}} \geq {0\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in {ɛ - ɛ_{n}^{+} - ɛ_{n}^{-}}}}}}} & \;\end{matrix}$

In an example, the system 10 considers the constraint that Q(φ, n, x)=0. It is observed that any dual feasible solution (λ, ψ in Equation 7) describes an affine function of x that is a lower bound on Q(φ, n, x). The system 10 compacts the terms λ and ψ into ω^z, where ω^z_(d₁d₂) is associated with the term x_(d₁d₂), as expressed below in Equations 8-11:

$$\omega^z_{d_1 d_2} = -\left( \lambda^-_{d_1 d_2} + \lambda^+_{d_1 d_2} \right) \quad \text{if } (d_1, d_2) \in \mathcal{E}^+ - \mathcal{E}_n^+ \qquad \text{Equation 8}$$

$$\omega^z_{d_1 d_2} = -\psi^+_{d_2} \quad \text{if } (d_1, d_2) \in \mathcal{E}_n^+ \qquad \text{Equation 9}$$

$$\omega^z_{d_1 d_2} = \psi^-_{d_2} \quad \text{if } (d_1, d_2) \in \mathcal{E}_n^- \qquad \text{Equation 10}$$

$$\omega^z_{d_1 d_2} = 0 \quad \text{if } (d_1, d_2) \in \mathcal{E}^- - \mathcal{E}_n^- \qquad \text{Equation 11}$$

In step 28, the system 10 formulates the correlation clustering problem. Specifically, the set of all dual feasible solutions across n∈N is denoted Z, which is indexed by the term z. It is observed that to enforce Q(φ, n, x)=0, it is sufficient to require that 0 ≥ Σ_((d₁,d₂)∈ℰ) x_(d₁d₂) ω^z_(d₁d₂) for all z∈Z. As such, the system 10 formulates the correlation clustering problem CC as an optimization using Z, as expressed below in Equation 12:

$$\text{Equation 3} = \min_{x_{d_1 d_2} \in \{0,1\}} \; \sum_{(d_1, d_2) \in \mathcal{E}^+} \varphi_{d_1 d_2}\, x_{d_1 d_2} \; - \sum_{(d_1, d_2) \in \mathcal{E}^-} \left( 1 - x_{d_1 d_2} \right) \varphi_{d_1 d_2} \qquad \text{Equation 12}$$

$$\text{s.t.} \quad 0 \geq \sum_{(d_1, d_2) \in \mathcal{E}} x_{d_1 d_2}\, \omega^z_{d_1 d_2} \qquad \forall z \in Z$$

It is noted that optimization in Equation 12 is intractable, since |Z| equals the number of dual feasible solutions across subproblems, which is infinite. Since the system 10 cannot consider the entire set Z, in step 30, the system 10 uses a cutting plane approach to construct a set Ẑ⊂Z that is sufficient to solve Equation 12. Specifically, the system 10 initializes Ẑ as the empty set and iterates between solving the LP relaxation of Equation 12 over Ẑ (referred to herein as the master problem), and generating new Benders rows until no violated constraints exist. This ensures that no violated cycle inequalities exist, but may not ensure that x is integral. To enforce integrality, the system 10 then iterates between solving the ILP in Equation 12 over Ẑ, and adding Benders rows to Ẑ. By solving the LP relaxation first, the system 10 avoids unnecessary and expensive calls to the ILP solver. A sketch of this loop is shown below.
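The loop of step 30 can be sketched at a high level as follows, where solve_master and separate are placeholder callables standing in for the LP/ILP solver over Ẑ and the Benders row generation of step 32; their names and signatures are assumptions for illustration.

```python
def benders_cutting_plane(solve_master, separate, subproblems):
    # solve_master(Z_hat, integral) -> solution x over the rows in Z_hat
    # separate(n, x) -> Benders rows for subproblem n (empty if no
    #                   violated cycle inequality is associated with n)
    Z_hat, done_lp = [], False
    while True:
        x = solve_master(Z_hat, integral=done_lp)
        new_rows = [row for n in subproblems for row in separate(n, x)]
        if new_rows:
            Z_hat.extend(new_rows)   # keep cutting on the current relaxation
        elif not done_lp:
            done_lp = True           # LP relaxation clean: switch to the ILP
        else:
            return x                 # integral, no violated cycle inequalities
```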

In step 32, the system 10 generates Benders rows. Specifically, in Benders decomposition, the variables of the original problem are divided into two subsets so that a first-stage master problem is solved over the first set of variables, and the values for the second set of variables are determined in a second-stage subproblem for a given first-stage solution. If the subproblem determines that the fixed first-stage decisions are infeasible, then cuts are generated and added to the master problem, which is then re-solved until no cuts can be generated. The new constraints added by Benders decomposition as it progresses towards a solution are called Benders rows.

More specifically, given x, the system 10 iterates over N, and generates one Benders row using Equation 7 if n is associated with a violated cycle inequality. The system 10 determines whether n is associated with a violated cycle inequality as follows. Given n, x, the system 10 iterates over (d₁, d₂)∈ℰ_n⁻. The system 10 then finds the shortest path from d₁ to d₂ on the graph with edges ℰ⁺, with weights equal to the vector x. If the length of this path, denoted as Dist(d₁, d₂), is less than x_(d₁d₂), then the system 10 has identified a violated cycle inequality associated with n.

FIG. 3 is an algorithm showing the cutting plane approach of the present disclosure, referred to as Benders decomposition for correlation clustering, as described above in connection with method 20. Specifically, the algorithm of FIG. 3 shows the following steps. In line 1, the system 10 initializes the nascent set of Benders rows Ẑ to an empty set. In line 2, the system 10 indicates that the system 10 has not solved the LP (linear program) relaxation yet. In lines 3-17, the system 10 alternates between solving the master problem and generating Benders rows, until a feasible integral solution is produced. More specifically, in line 4, the system 10 solves the master problem, providing a solution x, which may not satisfy all cycle inequalities. The system 10 enforces integrality if the system 10 has finished solving the LP relaxation, which is indicated by done_lp=True. In line 5, the system 10 indicates that it has not yet added any Benders rows this iteration. In lines 6-13, the system 10 adds Benders rows by iterating over the subproblems, and adding Benders rows corresponding to subproblems associated with violated cycle inequalities. Specifically, in line 7, the system 10 checks whether there exists a violated cycle inequality associated with ℰ_n⁻, which is executed by iterating over (d₁, d₂)∈ℰ_n⁻ and checking whether the shortest path from d₁ to d₂ is less than x_(d₁d₂). This distance is defined on the graph with edges ℰ⁺, with weights equal to x. In lines 8-10, the system 10 generates Benders rows associated with subproblem n, and adds them to the nascent set Ẑ. In line 11, the system 10 indicates that a Benders row was added this iteration. In lines 14-16, the system 10 instructs that if no Benders rows were added this iteration, then the system 10 enforces integrality on x when solving the master problem for the remainder of the algorithm. Finally, in line 18, the system 10 returns the solution x.

Prior to the termination of the algorithm of FIG. 3, the system 10 can produce a feasible integer solution x* from any solution x provided by the master problem. First, for each (d₁, d₂)∈ℰ, the system 10 sets x**_(d₁d₂)=1 if x_(d₁d₂)>½, and otherwise sets x**_(d₁d₂)=0. Second, for each (d₁, d₂)∈ℰ, the system 10 sets x*_(d₁d₂)=1 if d₁, d₂ are in separate connected components of the solution described by x**, and otherwise sets x*_(d₁d₂)=0. The cost of the feasible integer solution x* provides an upper bound on the cost of the optimal solution. A more sophisticated approach for producing feasible integer solutions will be discussed below; a sketch of this simple rounding is shown next.
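A minimal sketch of this simple rounding, using union-find to extract the connected components of the thresholded solution; the data layout is an assumption for illustration.

```python
def round_to_feasible(x, edges):
    # Threshold the fractional master solution at 1/2, take connected
    # components of the edges treated as uncut, and cut exactly the
    # edges that cross components.
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for d1, d2 in edges:
        if x[(d1, d2)] <= 0.5:             # x** = 0: treat edge as uncut
            parent[find(d1)] = find(d2)    # union the two components
    # x* cuts an edge exactly when its endpoints end up in different
    # components, which satisfies all cycle inequalities by construction.
    return {(d1, d2): int(find(d1) != find(d2)) for d1, d2 in edges}
```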

Returning to FIG. 2, in step 34, the system 10 accelerates the optimization (e.g., the Benders Decomposition). In an example, the system 10 accelerates the Benders Decomposition using the operations research technique of Magnanti-Wong Benders rows ("MWR"). Specifically, the Benders row, as discussed above in Equation 7, provides a tight bound at x*, where x* is the master problem solution used to generate the Benders row. However, it is desirable that the Benders row provide good lower bounds for a large set of x different from x*, while being tight (or nearly so) at x*. To achieve this, the system 10 uses a modified version of Equation 7, where the objective is replaced, and one additional constraint is added.

The system 10 uses a random negative valued vector (with unit norm) in place of the objective of Equation 7. The random vector is unique each time a Benders subproblem is solved. In an example, the system 10 uses an objective of −1/(0.0001+|φ_(d₁d₂)|), which encourages the cutting of edges with large positive weight, and which works as well as the random negative objective. Here, 0.0001 is a tiny positive number used to ensure that the terms in the objective do not become infinite.

The system 10 uses Equation 13, below, to enforce that the new Benders row is active at x*, by requiring that the dual cost is within a tolerance v∈(0, 1) of the optimal with regard to the objective in Equation 7 (hereafter, the parameter v is referred to as the optimal parameter).

$$v\, Q(\varphi, n, x) \leq - \sum_{(d_1, d_2) \in \left( \mathcal{E}^+ - \mathcal{E}_n^+ \right)} \left( \lambda^-_{d_1 d_2} + \lambda^+_{d_1 d_2} \right) x_{d_1 d_2} + \sum_{(n, d) \in \mathcal{E}_n^-} \psi^-_d\, x_{nd} - \sum_{(n, d) \in \mathcal{E}_n^+} \psi^+_d\, x_{nd} \qquad \text{Equation 13}$$

Specifically, v=1 requires optimality with respect to the objective in Equation 7, and v=0 ignores optimality. By way of example, v=½ provides strong performance. A sketch of the replacement objectives is shown below.
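The two replacement objectives described above (the random negative unit-norm vector and the deterministic −1/(0.0001+|φ|) alternative) can be sketched as follows; the function name and the list-based layout are assumptions for illustration. The resulting vector replaces the objective of Equation 7 when the subproblem is re-solved under the Equation 13 constraint.

```python
import random

def mwr_objective(pos_edge_phis, use_random=True):
    # Build the replacement objective used when generating
    # Magnanti-Wong rows: either a fresh random negative vector scaled
    # to unit norm, or the deterministic -1/(0.0001 + |phi|) objective
    # that encourages cutting edges with large positive weight.
    if use_random:
        c = [-random.random() for _ in pos_edge_phis]
        norm = sum(v * v for v in c) ** 0.5
        return [v / norm for v in c]
    return [-1.0 / (0.0001 + abs(p)) for p in pos_edge_phis]
```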

Testing and analysis of the above systems and methods will now be discussed in greater detail. The system of the present disclosure was applied to the benchmark Berkeley Segmentation Data Set ("BSDS"). The experiments demonstrate the following: 1) the system solves correlation clustering instances for image segmentation; 2) the system successfully exploits parallelization; and 3) the system dramatically accelerates optimization.

To benchmark performance, cost terms provided by the OPENGM2 dataset for BSDS are used. This allows for a direct comparison of the results of the system of the present disclosure to a benchmark. The present system used the random unit norm negative valued objective when generating MWR. The present system further used the IBM ILOG CPLEX Optimization Studio ("CPLEX") to solve all linear and integer linear programming problems considered during the course of optimization. A maximum total CPU time of 600 seconds was used for each problem instance (regardless of parallelization).

The selection of N was formulated as a minimum vertex cover problem, where for every edge (d₁, d₂)∈ℰ⁻, at least one of d₁, d₂ is in N. The present system solved for the minimum vertex cover exactly as an ILP. Given N, edges in ℰ⁻ are assigned to a connected selected node in N arbitrarily. It is noted that solving for the minimum vertex cover consumed negligible CPU time for the data set. This can be attributed to the structure of the problem domain, since the minimum vertex cover is, in general, an NP-hard problem. For problem instances where solving for the minimum vertex cover exactly is difficult, the minimum vertex cover problem can be solved approximately or greedily.

FIG. 4 depicts a set of charts demonstrating the effectiveness of the present system with various values of the optimal parameter v for different problem difficulties. Specifically, the gap between the upper and lower bounds is plotted as a function of time for various values of v on selected problem instances. The symbols a, b, and c indicate v=[0.5, 0.99, 0.01], respectively, and the symbol d indicates not using Magnanti-Wong rows ("MWR"). Further, the computation time both with and without exploiting parallelization of subproblems is shown, with dotted and solid lines, respectively. Titles are used to indicate the approximate difficulty of the problem as ranked by input file size among the 100 files.

In FIG. 4, it is observed that the system's use of MWR dramatically accelerates optimization. However, the exact value of the optimal parameter v does not affect the speed of optimization dramatically. Performance is shown with and without relying on parallel processing. The parallel processing times assume that one CPU is used for each subproblem. For the problem instances used during testing, the number of subproblems is under one thousand. The parallel and non-parallel time comparisons share only the time to solve the master problem. Large benefits of parallelization are observed for all settings of v. However, when MWR are not used, diminished improvement is observed, since the master problem consumes a larger proportion of total CPU time.

FIG. 5 is a graph showing the benefits of parallelization. Specifically, the benefits of parallelization and MWR across the data set are compared. The total running time is scatter-plotted against the total running time when each subproblem is solved on its own CPU, across problem instances. Grey circles indicate v=0.5, and black circles indicate when MWR was not used. A line with slope=1 is drawn to better enable appreciation of the grey and black points.

FIG. 6 is a table showing the convergence of the bounds for v=½, 0 (v=0 means that no MWR are generated). The percentage of problems solved that have a duality gap of up to tolerance ε within a certain amount of time (10, 50, 100, 300 seconds) is shown, with and without MWR/parallelization. Par=1 is used to indicate the use of parallelization, and par=0 is used to indicate no parallelization. A set of tolerances on convergence regarding the duality gap is considered, where the duality gap is the difference between the anytime solution (upper bound) and the lower bound on the objective. For each such tolerance ε, the present system computes the percentage of instances for which the duality gap is less than ε after various amounts of time. It is observed that optimization without MWR, but exploiting parallelization, performs worse than optimization using MWR, but without parallelization. This demonstrates that, across the data set, MWR is of greater importance than parallelization.

The following demonstrates a proof that there exists an x that minimizes Equation 3 for which Q(φ, n, x)=0. The proof maps an arbitrary solution (x, {x^n ∀n∈N}) to one denoted (x*, {x^(n*) ∀n∈N}), where Q(φ, n, x*)=0, without increasing the objective in Equation 3. Equation 14, below, is written in terms of x^n:

$\begin{matrix}\left. x_{d_{1}d_{2}}^{*}\leftarrow{x_{d_{1}d_{2}} + {\max\limits_{n \in }{x_{d_{1}d_{2}}^{n}\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ^{+}}}}}} \right. & {{Equation}\mspace{14mu} 14} \\{\left. x_{d_{1}d_{2}}^{*}\leftarrow{x_{d_{1}d_{2}} + x_{d_{1}d_{2}}^{n} - {1\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ_{n}^{-}}}}} \right.,{n \in }} & \; \\\left. x_{d_{1}d_{2}}^{n*}\leftarrow{0\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ^{+}}}} \right. & \; \\{\left. x_{d_{1}d_{2}}^{n*}\leftarrow{1\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ_{n}^{-}}}} \right.,{n \in }} & \;\end{matrix}$

The updates in Equation 14 are equivalent to the following updates in Equation 15, below, using f^n, f^(n*), where f^n, f^(n*) correspond to the optimizing solution for f in subproblem n, given x, x*, respectively.

$\begin{matrix}\left. x_{d_{1}d_{2}}^{*}\leftarrow{x_{d_{1}d_{2}} + {\max\limits_{n \in }{f_{d_{1}d_{2}}^{n}\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ^{+}}}}}} \right. & {{Equation}\mspace{14mu} 15} \\{\left. x_{d_{1}d_{2}}^{*}\leftarrow{x_{d_{1}d_{2}} - {f_{d_{1}d_{2}}^{n}\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ_{n}^{-}}}}} \right.,{n \in }} & \; \\\left. f_{d_{1}d_{2}}^{n*}\leftarrow{0\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ^{+}}}} \right. & \; \\\left. f_{d_{1}d_{2}}^{n*}\leftarrow{0\mspace{31mu} {\forall{\left( {d_{1},d_{2}} \right) \in ɛ_{n}^{-}}}} \right. & \;\end{matrix}$

It is noted that the updates in Equations 14 and 15 preserve the feasibility of the primal LP in Equation 6. It is further noted that since f^(n*) is a zero valued vector for all n∈N, then Q(φ, n, x*)=0 for all n∈N.

The total change in Equation 3 corresponding to edge (d₁, d₂)∈ℰ⁺, induced by Equation 14, is non-positive. The objective of the master problem increases by φ_(d₁d₂) max_(n∈N) x^n_(d₁d₂), while the total decrease in the objectives of the subproblems is φ_(d₁d₂) Σ_(n∈N) x^n_(d₁d₂). Next, the total change in Equation 3 corresponding to edge (d₁, d₂)∈ℰ_n⁻, induced by Equation 14, is considered, which is zero. The objective of the master problem increases by −φ_(d₁d₂)(1−x^n_(d₁d₂)), while the objective of subproblem n decreases by −φ_(d₁d₂)(1−x^n_(d₁d₂)).

The approach for producing feasible integer solutions will now be discussed. Prior to the termination of optimization, it can be valuable to provide feasible integer solutions on demand. This is so that a user can terminate optimization when the gap between the objectives of the integral solution and the relaxation is small. The production of feasible integer solutions is considered given the current solution x* to the master problem, which may neither obey the cycle inequalities nor be integral. This procedure is referred to as rounding.

Rounding is a coordinate descent approach defined on the graph with edges ℰ and weights κ, determined using x* as seen in Equation 16, below:

$$\kappa_{d_1 d_2} = \varphi_{d_1 d_2} \left( 1 - x^*_{d_1 d_2} \right) \qquad \forall (d_1, d_2) \in \mathcal{E}^+ \qquad \text{Equation 16}$$

$$\kappa_{d_1 d_2} = \varphi_{d_1 d_2}\, x^*_{d_1 d_2} \qquad \forall (d_1, d_2) \in \mathcal{E}^-$$

First, consider the case where x* is integral and feasible (where feasibility indicates that x* satisfies all cycle inequalities). Let x^(n*) define the boundaries, in partition x*, of the connected component containing n. Here, x^(n*)_(d₁d₂)=1 if exactly one of d₁, d₂ is in the connected component containing n under cut x*. It is observed that Q(κ, n, x^(0n))=0, where x^(0n)_(d₁d₂)=[(d₁, d₂)∈ℰ_n⁻], is achieved using x^(n*) as the solution to Equation 6. Thus x^(n*) is the minimizer of Equation 6. The union of the edges cut in x^(n*) across n∈N is identical to x*. It is observed that when x* is integral and feasible, a solution is produced that has cost equal to that of x*, as seen below in Equation 17:

$$x^{n*} \leftarrow \text{minimizer of } Q\left( \kappa, n, x^{0n} \right) \qquad \forall n \in N \qquad \text{Equation 17}$$

$$x^+_{d_1 d_2} \leftarrow \max_{n \in N}\, x^{n*}_{d_1 d_2} \qquad \forall (d_1, d_2) \in \mathcal{E}^+$$

$$x^+_{d_1 d_2} \leftarrow x^{n*}_{d_1 d_2} \qquad \forall (d_1, d_2) \in \mathcal{E}_n^-,\; n \in N$$

The procedure of Equation 17 can be used regardless of whether x* is integral or feasible. It is observed that if x* is close to integral and close to feasible, then Equation 17 is biased to produce a solution that is similar to x*, by design of κ.

A serial version of Equation 17 will now be discussed, which can provide improved results. A partition x⁺ is constructed by iterating over n∈N, producing component partitions as in Equation 17. The term κ is altered by allowing for the cutting of edges previously cut, at cost zero. FIG. 7 is an illustration of an algorithm showing the serial rounding procedure. In line 1, the system initializes x⁺ as the zero vector. In lines 2-3, the system sets κ according to Equation 16. In lines 4-8, the system iterates over n∈N to construct x⁺ by cutting edges cut in the subproblems. Specifically, in line 5, the system produces the lowest cost cut x^(n*) given the altered edge weights κ for subproblem n. In line 6, the system cuts edges in x⁺ that are cut in x^(n*). In line 7, the system sets κ_(d₁d₂) to zero for cut edges in x⁺. In line 9, the system returns the solution x⁺. When solving for the minimizer of Q(κ, n, x^(0n)), the system relies on a fast network flow solver. A sketch of this serial loop is shown below.
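A minimal sketch of the serial rounding loop of FIG. 7, where solve_subproblem is a placeholder for the network flow solver that minimizes Q(κ, n, x^(0n)); its name and signature are assumptions for illustration.

```python
def serial_rounding(roots, kappa, solve_subproblem):
    # kappa: dict edge -> weight, already set per Equation 16 (lines 2-3)
    # solve_subproblem(n, kappa) -> set of edges cut in the minimizer
    # of Q(kappa, n, x^{0n}); a max-flow solver would fill this role.
    x_plus = {e: 0 for e in kappa}           # line 1: start with nothing cut
    for n in roots:                          # lines 4-8: iterate over N
        cut = solve_subproblem(n, kappa)     # line 5: cheapest cut for n
        for e in cut:
            x_plus[e] = 1                    # line 6: cut the edge in x+
            kappa[e] = 0.0                   # line 7: re-cutting it is free
    return x_plus                            # line 9
```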

FIG. 8 is a diagram showing hardware and software components of a computer system 102 on which the system of the present disclosure can be implemented. The computer system 102 can include a storage device 104, computer vision software code 106, a network interface 108, a communications bus 110, a central processing unit (CPU) (microprocessor) 112, a random access memory (RAM) 114, and one or more input devices 116, such as a keyboard, mouse, etc. The computer system 102 could also include a display (e.g., liquid crystal display (LCD), cathode ray tube (CRT), etc.). The storage device 104 could comprise any suitable, computer-readable storage medium, such as disk, non-volatile memory (e.g., read-only memory (ROM), erasable programmable ROM (EPROM), electrically-erasable programmable ROM (EEPROM), flash memory, field-programmable gate array (FPGA), etc.). The computer system 102 could be a networked computer system, a personal computer, a server, a smart phone, tablet computer, etc. It is noted that the computer system 102 need not be a networked server, and indeed, could be a stand-alone computer system.

The functionality provided by the present disclosure could be provided by computer vision software code 106, which could be embodied as computer-readable program code stored on the storage device 104 and executed by the CPU 112 using any suitable, high or low level computing language, such as Python, Java, C, C++, C#, .NET, MATLAB, etc. The network interface 108 could include an Ethernet network interface device, a wireless network interface device, or any other suitable device which permits the computer system 102 to communicate via the network. The CPU 112 could include any suitable single-core or multiple-core microprocessor of any suitable architecture that is capable of implementing and running the computer vision software code 106 (e.g., an Intel processor). The random access memory 114 could include any suitable, high-speed, random access memory typical of most modern computers, such as dynamic RAM (DRAM), etc.

Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary and that a person skilled in the art can make any variations and modifications without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure.

What is claimed is:
1. A computer vision system for optimizing correlation clustering comprising: a memory; and a processor in communication with the memory, the processor: receiving input data, generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data, optimizing the Benders Decomposition for the generated correlation clustering formulation, and performing image segmentation using the optimized Benders Decomposition.
2. The system of claim 1, wherein the processor generates the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities, and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
3. The system of claim 1, wherein the processor optimizes the Benders Decomposition via a cutting plane algorithm.
4. The system of claim 3, wherein the Benders Decomposition includes a master problem and a set of subproblems, and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
5. The system of claim 1, wherein the processor accelerates the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.

6. The system of claim 1, wherein the input data is a Berkeley Segmentation Data Set (BSDS).
7. A method for optimizing correlation clustering by a computer vision system, comprising the steps of: receiving input data; generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data; optimizing the Benders Decomposition for the generated correlation clustering formulation; and performing image segmentation using the optimized Benders Decomposition.
8. The method of claim 7, further comprising the steps of generating the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities; and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
9. The method of claim 7, further comprising the step of optimizing the Benders Decomposition via a cutting plane algorithm.
10. The method of claim 9, wherein the Benders Decomposition includes a master problem and a set of subproblems, and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
11. The method of claim 7, further comprising the step of accelerating the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.
12. The method of claim 7, wherein the input data is a Berkeley Segmentation Data Set (BSDS).
13. A non-transitory computer readable medium having instructions stored thereon for optimizing correlation clustering by a computer vision system which, when executed by a processor, causes the processor to carry out the steps of: receiving input data; generating a correlation clustering formulation for Benders Decomposition for optimized correlation clustering of the input data; optimizing the Benders Decomposition for the generated correlation clustering formulation; and performing image segmentation using the optimized Benders Decomposition.

14. The non-transitory computer readable medium of claim 13, the processor further carrying out the steps of generating the correlation clustering formulation to utilize Benders Decomposition by: applying an auxiliary function to a conventional correlation clustering formulation, the auxiliary function being indicative of a cost to alter a vector of the auxiliary function to satisfy cycle inequalities; and mapping the altered vector to a solution that satisfies the cycle inequalities without increasing a cost of the auxiliary function.
15. The non-transitory computer readable medium of claim 13, the processor further carrying out the step of optimizing the Benders Decomposition via a cutting plane algorithm.
16. The non-transitory computer readable medium of claim 15, wherein the Benders Decomposition includes a master problem and a set of subproblems, and the cutting plane algorithm executes optimization over the variables of the master problem and then executes optimization over the subproblems in parallel.
17. The non-transitory computer readable medium of claim 13, the processor further carrying out the step of accelerating the Benders Decomposition utilizing Benders rows and Magnanti-Wong Benders rows.
18. The non-transitory computer readable medium of claim 13, wherein the input data is a Berkeley Segmentation Data Set (BSDS).